Function Word Generation in Statistical Machine Translation Systems
Abstract
Function words play an important role in sentence structure and express grammatical relationships with other words. Most statistical machine translation (SMT) systems pay too little attention to the translation of function words, which is noisy because of data sparseness and word alignment errors. In this paper, a novel method is designed to separate the generation of target function words from that of target content words in SMT decoding. With this method, target function words are deleted before translation modeling and inserted back into the translations during SMT decoding. To guide the insertion of target function words, a new statistical model is proposed and integrated into the log-linear model for SMT, which can lead to better reordering and better ranking of partial hypotheses. The experimental results show that our approach significantly improves SMT performance on a Chinese-English translation task.
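As a rough illustration of the delete-then-reinsert idea described in the abstract, the following Python sketch strips function words from a tokenized target sentence, records where they stood, and restores them. It is not the paper's implementation: the word list `FUNCTION_WORDS`, the helper names, and the feature name `fw_insertion` are all hypothetical, and the scoring function only shows the general shape of a log-linear combination of weighted features.

```python
# Hypothetical, tiny function-word list; the paper's actual inventory is not given here.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "on", "is", "are"}


def strip_function_words(tokens):
    """Split tokens into content words and a record of deleted function words.

    The record maps a gap index (number of content words to the left of the gap)
    to the list of function words that were removed from that gap.
    """
    content, removed = [], {}
    for tok in tokens:
        if tok.lower() in FUNCTION_WORDS:
            removed.setdefault(len(content), []).append(tok)
        else:
            content.append(tok)
    return content, removed


def reinsert(content, removed):
    """Put the recorded function words back into their original gaps."""
    out = []
    for i in range(len(content) + 1):
        out.extend(removed.get(i, []))
        if i < len(content):
            out.append(content[i])
    return out


def loglinear_score(features, weights):
    """Weighted sum of log-domain feature values, as in a log-linear model."""
    return sum(weights[name] * value for name, value in features.items())


tokens = "the cat is on the mat".split()
content, removed = strip_function_words(tokens)
restored = reinsert(content, removed)
# Assumed feature values and weights, chosen only to show the combination.
score = loglinear_score({"lm": -2.0, "fw_insertion": -1.0},
                        {"lm": 0.5, "fw_insertion": 2.0})
```

In a real decoder the insertion positions would not be known in advance; a statistical model would score candidate insertions, and that score would enter the log-linear combination as one more feature alongside the usual translation and language model features.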
Similar resources
Enhancing Function Word Translation with Syntax-Based Statistical Post-Editing
The generation of precise and comprehensible translations is still a challenge in the patent and scientific domain. In particular, function words are often poorly translated in standard machine translation systems, particularly across language pairs with greatly differing syntax. In this paper we exploit the target-side structure in tree-to-tree machine translation to post-edit function words au...
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
Word Graphs for Statistical Machine Translation
Word graphs have various applications in the field of machine translation. Therefore it is important for machine translation systems to produce compact word graphs of high quality. We will describe the generation of word graphs for state-of-the-art phrase-based statistical machine translation. We will use these word graphs to provide an analysis of the search process. We will evaluate the qualit...
Measure Word Generation for English-Chinese SMT Systems
Measure words in Chinese are used to indicate the count of nouns. Conventional statistical machine translation (SMT) systems do not perform well on measure word generation due to data sparseness and the potential long distance dependency between measure words and their corresponding head words. In this paper, we propose a statistical model to generate appropriate measure words of nouns for an E...
Linguistic Heuristics in Word Alignment
The IBM statistical machine translation (SMT) models [Brown et al., 1993] have been extremely influential in computational linguistics in the past decade. The (arguably) most striking characteristic of the IBM-style SMT models is their total lack of inherent linguistic knowledge. The IBM models demonstrated how much one can do with pure statistical techniques. This has inspired a whole new genera...